m 



s 



m 






A GCV based Arnoldi-Tikhonov regularization method 



p. Novati, M.R. Russo 
Department of Mathematics 
Q ' University of Padua, Italy 

(N' 

April 2, 2013 



Abstract 



For the solution of linear discrete ill-posed problems, in this paper we consider the Arnoldi- 
Tikhonov method coupled with the Generalized Cross Validation for the computation of the 
■^^ ' regularization parameter at each iteration. We study the convergence behavior of the Arnoldi 

method and its properties for the approximation of the (generalized) singular values, under the 
hypothesis that Picard condition is satisfied. Numerical experiments on classical test problems 
r^ ' and on image restoration are presented. 
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(^ ! 1 Introduction 

In this paper we consider discrete ill-posed problems, 
^: Ax^h, AeM^^^, feeK", (1) 

o 

CO I in which the right-hand side b is assumed to be affected by noise, caused by measurement or dis- 

cretization errors. These systems typically arise from the discretization of linear ill-posed problem, 
such as Fredholm integral equations of the first kind with compact kernel (see e.g. [HI Chapter 1] 
I^ . for a background). A common property of these kind of problems, is that the singular values of the 

''T^ ' kernel rapidly decay and cluster near zero. In this situation, provided that the discretization which 

C^ . leads to ([1]) is consistent with the continuous problem, this property is inherited by the matrix A. 

Because of the ill conditioning of A and the presence of noise in 6, some sort of regularization is 
generally employed for solving this kind of problems. In this framework, a popular and well estab- 
lished regularization technique is the Tikhonov method, which consists in solving the minimization 
problem 

min {\\Ax - bW"^ + \^\\Lxf] , (2) 

where A > is the regularization parameter and L G M^^^ is the regularization matrix (see e.g. 
[T3] and [M] for a background). We denote the solution of ([2]) by xa. For a discussion about the 
choice of L we may quote here the recent work [P and the references therein. As well known, the 



choice of the parameter A is crucial in this setting, since it defines the amount of regularization 
one wants to impose. Many techniques have been developed to determine a suitable value for the 
regularizing parameter and we can refer to the recent papers [2ll El [TUl [19] for the state of the art, 
comparison and discussions. We remark that in ([2|) and throughout the paper, the norm used is 
always the Euclidean norm. 

Assuming that b — b+e, where b represents the unknown error-free right-hand side, in this paper 
we assume that no information is available on the error e. In such a situation, the most popular 
and established techniques for the definition of A in ([2]), as for instance the L-curve criterion and 
the Generalized Cross Validation (GCV), typically requires the computation of the GSVD of the 
matrix pair [A, L). Of course this decomposition may represents a serious computational drawback 
for large-scale problems, such as the image deblurring. In order to overcome this problem, Krylov 
projection methods such as the ones based on the Lanczos bidiagonalization [U [TTl [T71 [18] and 
the Arnoldi algorithm [3J [3T] are generally used. Pure iterative methods such as the GMRES or 
the LSQR, eventually implemented in a hybrid fashion ([14, § 6.6]) can also be considered in this 
framework. 

In this paper we analyze the Arnoldi method for the solution of ([2]) (the so called Arnoldi- 
Tikhonov method, introduced in [3]), coupled with the GCV as parameter choice rule. Similarly to 
what made in [4] for the Lanczos bidiagonalization process, we show that the resulting algorithm 
can be fruitfully used for large-scale regularization. Being based on the orthogonal projection of 
the matrix A onto the Krylov subspaces ICm{A, b) = span{6, Ab, . . . , A"^~^b}, we shall observe that 
for discrete ill-posed problems, the Arnoldi algorithm is particularly efficient for the approximation 
of the GCV curve, after a very few number of iterations. 

Indeed, under the hypothesis that Picard condition is satisfied [H], we provide some theoretical 
results about the convergence of the Arnoldi- Tikhonov methods and its properties for the approx- 
imation of the singular values of A. These properties allow us to consider approximation of the 
GCV curve which can be obtained working in small dimension (similarly to what made in [3] where 
a "projected" L-curve criterion is used). The GCV curve approximation leads to the definition of 
a sequence of regularization parameters (one for each step of the algorithm), which are fairly good 
approximation of the regularization parameter arising from the exact SVD (or GSVD). 

The paper is organized as follows. In Section 2 we present a brief outline about the Arnoldi- 
Tikhonov method for the iterative solution of ([2]). In Section 3 and 4 we provide some theoretical 
results concerning the convergence of the Arnoldi algorithm and the SVD (GSVD) approxima- 
tion. In Section 5 we explain the use the AT method with the GCV criterion. Some numerical 
experiments are presented in Section 6 and 7. 



2 The Arnoldi-Tikhonov method 

Denoting by JCm{A,b) = span{b,Ab,...,A"^~^b} the Krylov subspaces generated by A and the 
vector 6, the Arnoldi algorithm computes an orthonormal basis {wi, ..., Wm} of JCm{A, b). Setting 
Wm = [wi, ..., Wm] G M^^™, the algorithm can be written in matrix form as 

AWm = WmHm + h„i+l,mWm+iem, (3) 

where H„i = (hij) e ^rnxm jg ^^ upper Hessenberg matrix which represents the orthogonal 
projection of A onto ICm{A, b), and Cm = (0, ..., 0, 1)"^ G K™. Equivalently, the relation ([3|) can be 



written as 



where 



H„ 



AWm = Wm+lHm, 
Hm 



{m-\-\) xm 



(4) 



(5) 



In exact arithmetics the Arnoldi process terminates whenever hm+i,m — 0, which means that 
If we consider the constrained minimization 



min U\Ax-b\\^ + \^\\Lx\\^], 
xe>Cr„{A,b) *- ■' 



writing x = Wmym, Um € K™, and using ([4]), we obtain 



mm 



jll^mym - \\b\\ ei|f + A^ ||iW"mJ/m|l^| 



(6) 



(7) 



which is known as the Arnoldi-Tikhonov (AT) method. Dealing with Krylov type solvers, one 
generally hopes that a good approximation of the exact solution can be achieved for m <$^ N, 
which, in other words, means that the spectral properties of the matrix A are rapidly simulated 
by the ones of Hm- This method has been introduced in l3] in the case oi L — I^ (where Im is 
the identity matrix of order N, so that ||-/jll^m2/m|| = Il2/m||) with the basic aim of reducing the 
dimension of the original problem and to avoid the matrix-vector multiplication with A^ used by 
Lanczos type schemes (see [1] [11] and the references therein) . 

It is worth noting that ([T]) can also be interpreted as an hybrid method. Indeed, the minimiza- 
tion ^ with L — Iff is equivalent to the inner regularization of the GMRES [H]. We remark 
however, that for L ^ I^, the philosophy is completely different, since ([7]) represents the projection 
of a regularization, while the hybrid approach aims to regularize the projected problem. As we 
shall see, this difference can be appreciated more clearly whenever a parameter choice rule for A is 
adopted. 

As well known, in many applications the use of a suitable regularization operator L ^ /jv , may 
substantially improve the quality of the approximate solution with respect to the choice oi L = I^ . 
Anyway, we need to observe that with a general L G M^^^, the minimization ([T]) is equivalent to 



mm 



Hm 

\LW„ 



V-n 



l^llei 




(8) 



so that, for P Ri N, the dimension of (j8]) inherits the dimension of the original problem. Computa- 
tionally, the situation can be efficiently faced by means of the " skinny" QR factorization. Anyway, 
assuming that P < N, in order to work with reduced dimension problems, we add N — P zero 
rows to L (which does not alter (jB))) and consider the orthogonal projection of L onto /Cm (A, 6), 
that is, 

Lm :- Wl,LWm e M™^". 



This modification leads to the reduced minimization 

|||ii"m2;m - ll^ll ei||" + A^ \\Lmy,n\\ 



mm 

mm {\\Ax ^ bf + X^W^Lxf} , 



(9) 



(10) 



which is not equivalent to ([6|) anymore. Anyway, the use of Lm appears natural in this framework, 
and it is also justified by the fact that 

\\W^Lx\\ < \\Lx\\, 

since ||W^La;|| = ||W„iW^Lx|| and ||W^mT4^,^|| = 1, being W^Wm ^'^ orthogonal projection. We 
observe moreover that L,„ would be the regularization operator of the projection of a Franklin 
type regularization [B] 

{A + XL)x^ b. 



3 Convergence analysis for discrete ill-posed problems 

In what follows we denote by A = UT.V'^ e M^""^ the SVD of A where E = diag{ai, ..., ctat), and 
by Am ■= Ura^rnVm the truncated SVD. We remember that the matrix A^ := A — Am is such 
that ||A,„|| ^(jm+i- 

An important property of the methods based on orthogonal projections such as the Arnoldi 
algorithm, is the fast theoretical convergence {hm+i,m ^^ 0) if the matrix A comes from the 
discretization of operators whose spectrum is clustered around zero. Denote by Aj, j > 1 the 
eigenvalues of A and assume that |Aj| > |Aj+i| for j > 1. We have the following result (cf. [Ml 
Theorem 5.8.10]), in which we assume N arbitrarily large. 

Theorem 1 Assume that 1 ^ o-{A) and 

y ^ a^ < oo for a certain < p < 1. (11) 

Let prn{z) ^ ll"'^-^^{z - Xt) . Then 

lb™(A)||<(^)'"^', (12) 

where 

r?(p)<(l+p)E<- (13) 

Since 

TT. h,+,,, < \\pm{A)b\\ , (14) 

J- -*-«— 1 

for each monic polynomial Pm of exact degree m (see [31] p. 269]), Theorem [T] reveals that the rate 
of decay of Jli=i ^i+i,i is superlinear and depends on the p-summability of the singular values of A. 
We remark that the superlinear convergence of certain Krylov subspace methods when applied to 
linear equations involving compact operators is known in literature (see e.g. |23j and the references 
therein). The rate of convergence depends on the degree of compactness of the operator, which 
can be measured in terms of the decay of the singular values. 

Here, dealing with severely ill-posed problems, the typical situation is CTj == 0(e^"^), where 
Q > handles the degree of ill-conditioning [161 Definition 2.42]. In this situation, the following 
result expresses more clearly the fast decay of /ii+i,i with respect to the value of a. 



Proposition 2 Let Oj ~ 0{e "■'). Then, for m —> oo 

1/ 



■j-fm \J-/™ rnc,a + 2,^fl\ , ^ 

n,^/^+i.) <fce-^ + — +o(™), (15) 

where k is a constant independent of m. 



Proof. Let fc be a constant such that dj < ke "^ . Then for p > 



(cf ([13])). Now consider the approximation 

which is fairly accurate for p « 0. Using this approximation in (J12p . we find that the minimum of 



?7(p)e 



i/p 



m 
is attained for p* = -^. Using this value, the bound ([TBI) , and defining t := — , we obtain 

1 - e~"P' TO 



7 m I "^Q^ 1 

fc exp In 



i 



i Vl-e"*e 
^ fc™cxp('^r-l + t('i + i')+0(i2)')') fori^O 

= fc^exp I -^^^ +TO f " 1+0(1)] foryn^oo. 

The result immediately follows from (ITil) and p^ . ■ 

In Figure [T] (a)-(b) we experimentally test the bound (fTS)) working with test problems SHAW 
and WING, taken from Hansen's Regularization Toolbox [15]. For these two problems it is known 
that a — 2 and a = 4.5 respectively. 

In the following results we assume to work with problems in which the discrete Picard condition 
is satisfied, that is, u^b = 0{am), where Um denotes the TO,-th column of U, and b is assumed to 
be the exact right-hand side. 

Proposition 3 Assume that the singular values of A are of the type Uj — 0{e^°'^). Assume 
moreover that the discrete Picard condition is satisfied. Let Vm '■= [vq, ■■■,Vm-i] G jR^x™ where 
Vk '■= A'^b/ uA^^bu. If Vm has full column rank, then there exists Cm G M™^™ nonsingular, 
Em^FmeR^''"', such that 

Vm = UmCm+Em, P™ || = 0(V^Cr,„) , (17) 

Um = VmCm^+Fm, \\Fm^m\\ = 0{^am) ■ (18) 



Proof. Let U^ := [u^+i, ..., ujv] G ^(N-va)xni^ Writing c^o) = C/^uo e K" and e(°) 

T, 



C^m (C^m)^ ^0 e M", we have 
The Picard condition impUes 

since aj = 0(e^"^) and then using 



,(0) 



Fo = C/„c(°)+e(o). 



(C/^) ^0 =0(a„,), 



1/2 






I J?'>?7T.+ 1 



(19) 



From the relation \\A— t/mE„iy^ = cr„i+i, after some computation one easily finds that for 
< fc < m- 1, 

where c''') = U'^Vk € M" and 

Ak) _ IP "II yt^(fc-l) I ^/ N 

~ ||A'=6|| '+0»(o-,„+i), 

so that lleC^'ll =0(0-™). Defining C^ = [c("\ ..., c("-i)] = f/^Kx eM™^"andS„ = [e^o', ...,e(™-i)] € 
R^x™ we have proved (HZJ). 



By (|17p . we can write 



and since Em ~ U^ (C^m) ^m we have that 



-t^m^m — 



ive that 



(20) 
(21) 



Now observe that (|17p implies 



(Uri) Vm = 



I 0(cr,„+i) •• • 0(cr,„+i) 



{N—m)xm 



and 






OiaN) 



UlVm = 



0{am) ■■■ 0{Gr, 

Using the Cramer rule to invert t/^14i we find that each entry of yJ^^ra) ^m, G 
the type 0(1), and hence 

Defining F„ = -E^C-^ we obtain ^ by ^, ^ and ^, and applying ([TO)) . 



©(fm+i) ••• 0(cr„,+i) 



^rn 



(N—va)xm 



is of 



(22) 



Remark 4 The hypothesis aj = 0(e~"^) of Proposition\^is just used to have ||e^°^|| = 0{am) by 
(f jff|]. The result of the proposition can be extended to work with moderately ill-posed problems, in 
which aj = 0{j~"), provided that a is large enough. As consequence in this situation we would 
have a slower decay of ||£'jri|| and ||i^„iS„i||. 

The following result improves the one of Theorem [T] (which holds without hypothesis on b) . 

Proposition 5 Under the hypothesis of Proposition\^ 



hm+l,rn = 0{y/mam). 



Proof. By 



^m+l,m = 'wJ^^+^AWr, 






since ||A„j|| = cr„,+i. Therefore, using p^ we obtain 



which concludes the proof, since w'^_^_iVm — and ||iv„S„i|| = 0{y/mam). ■ 

In Figure[T](c)-(d) we compare the decay of the sequence {/im+i,m}„>x with that of the singular 
values, working again with the test problems SHAW and WING. 
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Figure 1: (a)-(b) decay rate of (11^=1 ^i+i,i) " (dash-dot line) and hound !il5\) (solid line), (c)-(d) 
decay of hm+i,m and am- On the left the results for SHAW and on the right the results for WING. 
In each experiment N — 32. 

We need to remark that the results of Figure [T] are obtained working with the Householder 
implementation of the Arnoldi algorithm and hence simulating what happens in exact arithmetics. 



4 The approximation of the SVD 



The use of the Arnoldi algorithm as a method to approximate the marginal values of the spectrum 
of a matrix is widely known in literature. We may refer to |28[ Chapter 6] for an exhaustive 
background. Using similar arguments, in this section we analyze the convergence of the singular 
values of the matrices Hm to the largest singular values of A. For the Lanczos bidiagonalization 
method [1] [26] , the analysis can be done by exploiting the connection between this method and the 
symmetric Lanczos process (see e.g. [8]). The use of the Lanczos bidiagonalization to construct 
iteratively the GSVD of {A, L) has been studied in [17]. 

Let us consider the SVD factorization of i?™, that is, i?„i = t/(™)i](™)yMT^ jj(m) ^ ^(m+i)x(m+i) ^ 



yim.) ^^nixm ^^^ 



( -I 



(m) 



J]("i) 



V 



'm 

) 



ri(m.+ l) xni 



We can state the following results. 



Proposition 6 Let Um+i = Wm+iW-"'^ E RNx{m+i) ^^^ y^^ ^ W™t/('") e 



pAfxm 



Then 



Proof. Using ([4]), we have 



(m)-pT 






A-^W^™W^^. 



Observe that since C7„+iE(™) = W^™+iC/(™)S('"), where S(") e R^xm jg j^g^ ^.(m) without the 
last row, and U^"^^ € R(m+i)xm jg jj(m) ^j^i^out the last column, the above result states that the 
triplet fw^+if/^"), S(™), W„, !/(")] defines an approximation of the truncated SVD of A, which 

cannot be too bad since ||^(^ — ^ml^m) 11 — ll^ll- Moreover, it states that if the Arnoldi algorithm 
does not terminate before N iterations, then it produces the complete SVD. The following result 
gives some additional information. 



,(m) 



„(™) 



Proposition 7 Let Uj^ G ]R™+^ and v^ G M™ he respectively the right and left singular vectors 
relative to the singular value <J^ of Hm, that is, 7J,„i;j;,'" — cr)^ u"^ and H^u^^ — (^^ "^k ' 
with 1 < k < m. Then defining Uk — Wm+iUj^ and Vk = WmV]^ we have that 



Avk - (^k Uk = 0, 



W:^{A^Uk-arvk) 



(™)t; 



(23) 
(24) 



Proof. (P5|) follows directly by ^. Moreover, since 

^mWfc -O^. V^ -U, 



using i/„j = W^A^Wm+i, and the definition of Ufe and Vk, we easily obtain (f24| 

Remark 8 Using the square matrix Hm to approximate the singular values of A, that is, computing 

the S 

then 



the SVD H„, = u{r>r)Y,{rrL)y{rn)T^ ^^^^^ ^^^ jj(-m.) ^Y^(-m) ^y{va) g Rmxm^ if E^v''^'^ = 4"^ 4""^ 



Avk - al 'uk 



<hm+l.m with Uk^WmU"^ ,Vk^WmV 



(m) 
k ■ 



(25) 



The above relation is very similar to the one which arises when using the eigenvalues of Hm 
(the Ritz values) to approximate the eigenvalues of A \28\ ^6.2]. Note moreover that whenever 
hm+i,m ~ 0, and hence very quickly for linear ill-posed problems (see Section 3), the use of Hm 
or Hm is almost equivalent to approximate the largest singular values of A. 

The Galerkin condition (1^^ is consequence of the fact that the Arnoldi algorithm does not work 
with the transpose. Obviously, ii A = A^ , the algorithm reduces to the symmetric Lanczos process 



and, under the hypothesis of Proposition [71 we easily obtain A^Uk ~ o'j! 
case oi A y^ A^ , Proposition [7] ensures that since Vk 



(m) 



Vk 



Arn) 



{m) 



WrnvT' e lCm{A,b), by dMl the vector 



(^k ^fc is J^st t^^ orthogonal projection of A Uk onto )Cm{A, b), that is, a 
which implies 



(m) 



Vk 



0. In the general 
3 vect 

WmW^A^Uk, 



A^' 



Uk 



(m). 



Vk 



<\\{I-WmWl)A^WmWl 



(26) 



This means that the approximation is good if A^u^ is close to /C„j(yl, 6). It is interesting to observe 
that (pS)) is just the "transpose version" of (I^Sj) since 

hm+l.,n = \\{I- WmWl)AWmWl\\ , 

which can be easily proved using again ([3]) (cf. [28, Chapter 4]). 

Experimentally, one observes that the Arnoldi algorithm seems to be very efficient for approx- 
imating the largest singular values for discrete ill-posed problems. In order to have a-posteriori 
strategy to monitor step-by-step the quality of approximation, we can state the following. 



Proposition 9 Assume that the matrix A has full rank. Then 



A Uk- (Tk Vk 



< WW'^^.AW 

— \\** TTl-f 1^-*-' ' ? 



+\-^y' m II ) 



(m) 



(27) 



where Uk, Vk, fj,™ are defined as in Proposition^ and Wm — [wm+i, ■■■,wn]- 



Proof. Since Vk G /C™(^,&), and Uk = W^m+i< ^ by ^ 

A^uk-ai-'hk < {W^fA^Wm+i 



(28) 



Formula (|27p is rather interesting because since hij = wf Awj from the Arnoldi algorithm. 



** m-t-l^-*-*^' rn 



hi. 



m+l 



hi. 



N 



''m-{-l.m-\-l 



Im+L.N 



Since in many cases the elements of the projected matrix Hm tends to annihilates departing from 
the diagonal (this is the basic assumption of the methods based on the incomplete orthogonaliza- 
tion, see e.g. [29]), one may obtain useful estimates for the bound ([27)) working with few columns 
of Wm_^_iAWm, that is, with few columns of Wm, and hence obtaining a-posteriori estimates for 
the quality of the SVD approximation. In order to have an experimental confirmation of this 



10 



A-C/^+iSM]/, 



and WWl^.A 



m+l^ 



^m+1 , 



for 



statement, in Figure [H we show the behavior of 

some test problems. Note that || W^_|.]^Amm+i|| comes from the bound (|27p with W:^ replaced by 

Wm+l- 

We remark that Proposition [5] and [5] can be used to arrest the procedure whenever the noise 
level e is known, since it is generally useless to continue with the SVD approximation if we find 
"k^ << e, for a certain k and m. Indeed, in this situation the Picard condition is no longer 
satisfied since typically f/^ « e for m large enough. 

For what concerns the generalized SVD of the matrix pair [A, L), let AX — US and LX = VC, 
where S — diag{si, ...,sn) and C = diag{ci, ...,cn), X e R^^^ is nonsingular and C/, F G R^^-'^ 
are orthogonal. Moreover let i?„X(") == t/(™)5'('«) and L„X(") = y(™)c(™), where S'^") = 
diag{sl^ , ...,Sm ) and C'-™-' = diag{c]^ , ...,Cm ), be the generalized SVD of the matrix pair 
{Hm.Lm). In this situation, for the convergence of the approximated generalized singular values 
and vectors, we can state the following result. 
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Figure 2: Decay behavior of 



A-c/™+iE(™)y„ 



(solid line) and lower hound nW^j^iAw< 



m+l 



arising from Proposition\M (dash-dot line) for BAART (a), WING (b), SHAW (c) and LLAPLACE 
(d). The dimension of each problem is N = 32. 

Proposition 10 Let u}." , w^" and xj,™ be the k-th column of the matrices C/(™) e ^(m+i)xm^ 
yim) g j^mxm and X^"''> e R™x™ respectively. Then defining Uk = W,„+iu^"\ Vk = VK™^™^ and 
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Xk = WmXl^ , we have 



Axk 



Wi{Lxk-cr'vk) 



(m) 



Uk 



(29) 
(30) 



Proof. Similarly to Proposition [71 (P^ and (I5U1) follows immediately from the basic relation 
©• ■ 

As before the proposition ensures that if the matrix A has full rank, than the Arnoldi algorithm 
allows to construct the GSVD of {A,L). Step by step, the quality of the approximation depends 
on the distance between span{Lwi, ...,Lwm\ and Km{A,b). Similarly to (pS)) and (pS)) . since 
Vk = Wrav)!^' € lCm{A, b) , wc have 



J-— (™)— 

^^Xk - C) 'Vk 



<\\{I-WraWl)LWraWl 



and 



Lxk - al 'vk 



< 



{W^ 



LW„ 



In Figure [3] we show the convergence of the singular values of Hm, and the generalized singular 



values of the matrix pair lHm,Lm), with 



L = 



/I 



Vo 



-1 
/ 



working with the test problems SHAW and BAART. The results show that the approximations are 
quite accurate. It is interesting to observe that, in both cases, after 8-9 iterations the algorithm 
starts to generate spurious approximations. This is due to the loss of orthogonality of the Krylov 
vectors, since in these experiments (and in what follows) we have used the Gram-Schmidt imple- 
mentation. Working with the Householder version of the algorithm the problem is fixed. Anyway 
in the framework of the regularization, a more accurate approximation of the smallest singular 
values is useless because of the error in b. 
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BAART - Singular Value; 



BAART - Generalized Singular Values 
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Figure 3: Plots of the singular values (circle) of the matrix Hm (left) and the generalized singular 
values of the matrix pair {H„i, im) ("right) versus the iteration number k, for the problem BAART 
and SHAW with N = 32. The solid lines represent the singular values of the matrix A (left) and 
the generalized singular values of the matrix pair {A, L) (right). 



5 Generalized Cross- Validation 

A popular method for choosing the regularization parameter, which does non require the knowledge 
of the noise properties nor its norm ||e||, is the Generalized Cross- Validation (GCV) [HIISS]- The 
major idea of the GCV is that a good choice of A should predict missing values, so that the model 
is not sensitive to the elimination of one data point. This means that the regularized solution 
should predict a datum fairly well, even if that datum is not used in the model. This viewpoint 
leads to minimization with respect to A of the GCV function 



G(A) 



\b-Axx\ 



[traceil - AAxW 

where A\ — {A^A + X^L^L)^^A^ is the matrix that gives the regularized solutions of © from 
the normal equations 

{A^A + X^L^L)xx = A^b. 
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Using the GSVD of the matrix pair {A,L), with a general A e IR^^^,L G M^^^, let 
A = USX~^ and L = VCX~^ , where S = diag{si, ..., sp) and C = diag{ci, ..., cp), the generalized 
singular values 7^ of (A, L) are defined by the ratios 

Si 

7i = — , 1 = 1,...,F. 

Therefore, one can show that the expression of the GCV function is given by 



EL( 



A „,T 



it+x^ 



u: 



b 



2 



G(A) = '-^ y , ., . (31) 

(M-(iV-P)-Er=i:^ 

For the square case M — N, and P < N, rearranging the sum at the denominator we obtain 



v-JV / A 



.,T 



G(A) = : '^'^""'^ ;/ . (32) 






The GCV criterion is then based on the choice of A which minimizes G(A). It is well known that 
this minimization problem is generally ill-conditioned, since the function G(A) is typically flat in 
a relatively wide region around the minimum. As a consequence, this criterion may even lead to a 
poor regularization [20l [22l [30] . 



As already said in the Introduction, for large-scale problems the GCV approach for ([2]) is too 
much expensive since it requires the SVD (GSVD). In this setting, our idea is to fully exploit the 
approximation properties of the Arnoldi algorithm investigated in Section 3 and 4. In particular, 
our aim is to define a sequence of regularization parameters {Am}, i-e., one for each iteration of the 
Arnoldi algorithm, obtained by the minimization of the following GCV function approximations 

G™(A) = . F-^-^^-IHIeif (33) 



[N-m + YZi 



A2 



tI-'Va^ 



where ym,\ solves the reduced minimization (jlOp . and 7)™ , i = 1,...,to, are the approximations 
of the generalized singular values, obtained with the Arnoldi process. Note that 

IITr llMl iP \^ ( A^ (™)^ ^ , ( (™)^ \^ 

||-ffmym,A-||fe||ei|| ^2^\-!z;^ —u\ c +(m™+ic1 , 

where ui'' is defined as in Proposition [TUl and c = ||6||ei, so that the construction of Gm(A) 
can be obtained working in reduced dimension. The basic idea which leads to the approximation 
Gm(A) « G(A), is to set equal to the generalized singular values that are not approximated by 
the Arnoldi algorithm, and that are expected to be close to after few iterations. This is justified 
by the analysis and the experiments of Section 3 and 4. 

We remark that in a hybrid approach fl8l, one aims to regularize the projected problem 

min {||i?my-||&||ei||}. (34) 
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Since no geometrical information on the solution of (p4)) can be inherited from the solution of the 
original problem, the choice of Lm = Im as regularization operator is somehow forced (this is a 
standard strategy for hybrid methods [Mj §6.7]). In this framework, if the GCV criterion is used 
to regularize (|34|) . the basic difference with respect to (|33|) is at the denominator, where N — m 
is replaced by m. We observe moreover that p3p is similar to the GCV approximation commonly 
used for iterative methods, in which the denominator is simply N — m pL4, §7.4]. 

In the following, the algorithm that has been used for the tests of the next sections. 



AT - GCV Algorithm 

given AeM^><^, 6 e M^, 5 
while I ||r„^- ||r„_i|| |/ ||r™|| > 5 

update Hjn and L™ from ^ and (JH) 

compute GSVD(iJ„i,Lm) 

compute Am — argmiuA Gm(A) 

f H„, \ ( \\h\\ei 

solve mmj^^gM™ \ ^ t \Vra~\ q 

compute the corresponding residual Vm 
end 

compute Xra = Wmllm 

The stopping rule used in the algorithm is just based on the residual. As an alternative, one 
may even employ the strategy adopted in [4], based on the observation of the GCV approximations. 



6 Numerical results 

In order to test the performance of the proposed method, we consider again some classical test 
problems taken from the Regularization Tools [15 . In particular in Figures |4][5l we consider the 
problems BAART, SHAW, FOXGOOD, LLAPLACE, with right-hand side affected by 0.1% or 1% 
Gaussian noise. The regularization operator is always the discretized first derivative, augmented 
with a zero row at the bottom to make it square (cf. (HJ). For each experiment we show: (a) the 
approximation of G{X) obtained with the functions Gm(A) for some values of m, with a graphical 
comparison of the local minima; (b) the approximate solution; (c) the relative residual and error 
history; (d) the sequence of selected parameters {Xm}, with respect to the one obtained with the 
minimization of G(A) (denoted by Xa in the pictures) and the optimal one (Aopt) obtained by the 
minimization of the distance between the regularized and the true solution [25] 



where 



min \\xreg - XtrueW =min/(A), 



N 



/WHEI7:;;^?--E(«r^)- 



(7f + A2) ^. 



=p+l 
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Figure 4: Results for BAART (top) and SHAW (bottom). The dimension of each problem is 
N = 120. Noise level e = 10"^ 
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Figure 5: Results for FOXGOOD (top) and I_LAPLACE (bottom). The dimension of each problem, 
is N ^ 120. Noise level e = 10"^. 



7 An example of image restoration 

We conclude with an illustration of the performance of the GCV-Arnoldi approach on a 2D image 
deblurring problem which consist in recovering the original n x n image from a blurred and noisy 
observed image. 

Let X be a n X n two dimensional image. The vector x of dimension N = n^ obtained by 
stacking the columns of the image X represents a blur-free and noise-free image. We generate an 
associated blurred and noise-free image b by multiplying x by a block Toeplitz matrix A e M^^^ 
with Toeplitz blocks, implemented in the function blur .m from the Regularization Tools 15 . This 
Matlab function has two parameters, band and sigma; the former specifies the half-bandwidth of 
the Toeplitz blocks and the latter the variance of the Gaussian point spread function. The blur and 
noise contaminated image b £ M.^ is obtained by adding a noise- vector e € M^, so that b — Ax + e. 
We assume the blurring operator A and the corrupted image b to be available while no information 
is given on the error e, we would like to determine a restoration which accurately approximates 
the blur-free and noise-free image x. 

We consider the restoration of a corrupted version of the 256 x 256 test image mr i . png. Contam- 
ination is by 1% white Gaussian noise and space-invariant Gaussian blur. The latter is generated 
as described above with blur parameters band=7, sigma=2, so that the condition number of A 
is around 10^^. Figure [6] displays the performance of the AT-GCV. On the left the blur- free and 
noise-free image, on the middle the corrupted image, on the right the restored image. From top to 
bottom the image in original size and two different zooms. The regularization operator is defined 
as (cf. 0) 

i = /„®Li+Li(g)/„eM^^^, 

where Li G M"^" is the discretized first derivative with a zero row at the bottom (cf. also [171 

§5]). 
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Original Image 



Blurred and noisy Image 



Restored image 












Figure 6: Restoration of mri.png image 
e ~ 10^^ and blur parameters band=7, 
and two zoom. 



Original image, blurred and noisy image with noise level 
Sigma— 2, restored image. From top to bottom original size 



The experiment has been carried out using Matlab 7.10 on a single processor computer (Intel 
Core i7). The result has been obtained in 5 iterations of the Arnoldi algorithm, in around 0.5 
seconds. Many other experiments on image restoration have shown similar performances. 
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8 Conclusion 

The fast convergence of the Arnoldi algorithm when apphed to compact operators makes the AT 
method particularly attractive for the regularization of discrete ill-posed problems. The projected 
problems rapidly inherit the basic features of the original one, allowing a substantial computational 
advantage with respect to other approaches. 

In this paper, in absence of information on the noise which affects the right-hand side of the 
system, we have employed the GCV criterion. Contrary to the hybrid techniques, the sequence of 
regularization parameters {Am}m>i is defined in order to regularize the original problem instead 
of the projected one, leading to GCV approximations which are similar to the ones used for 
pure iterative methods ([TH §7.4]). Notwithstanding the intrinsic difficulties concerning the GCV 
criterion, the arising algorithm has shown to be quite robust. Of course there are cases in which the 
method fails, but the numerical experiments presented are rather representative of what happens 
in general. 
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